IN VIVO SELECTION OF RNA-BINDING PEPTIDES

GOVERNMENT INTEREST 
This invention was made with Government support under Grant
No. GM47478, awarded by the National Institutes of Health.
The Government has certain rights in this invention.

TECHNICAL FIELD

The invention applies the technical fields of
combinatorial chemistry and molecular genetics to the
isolation of RNA binding polypeptides. The RNA binding
polypeptides are used in, e.g., therapy and diagnosis of
pathogenic microorganisms.

BACKGROUND 

RNA-binding proteins are known to fulfill a large
number of diverse roles in different organisms. See
generally Burd & Dreyfuss, Science 265, 615-621 (1994).
In RNA viruses, such as HIV, RNA-binding proteins play
essential roles in expression of viral genes. For
example, HIV encodes two RNA binding proteins, termed Tat
and Rev. Rev is a regulatory RNA binding protein that
facilitates the export of unspliced HIV pre-mRNA from the
nucleus. Malim et al., Nature 338, 254 (1989). Tat is
thought to be a transcriptional activator that functions
by binding a recognition sequence in 5' flanking mRNA.
Karn & Graeble, Trends Genet. 8, 365 (1992). Therapeutic
compositions for inhibiting the interactions of Rev with
its target are discussed in commonly owned co-pending
application USSN 08/071,811 (incorporated by reference in
its entirety for all purposes). In bacteria, RNA binding
proteins are known, inter alia, to have a role in
activation of genes encoding ribosomal proteins. Li et
al., Cell 38, 851-860 (1984). In mammalian cells,
mutations in RNA binding proteins have been correlated
with several defects or diseases. For example, defects in
genes encoding RNA binding proteins have been reported to
result in azoospermia (Baker, Nature 340, 521 (1989)) and
fragile X mental retardation syndrome (Siomi et al.,
Science 282, 563 (1993)). Mutations in several RNA
binding proteins have been reported to cause
developmental defects in Drosophila. Robinow et al.,
Science 242, 1570 (1988).

In recent years, there have been several
developments in methods of isolating polypeptides having
a desired binding specificity. These methods include the
phage display technique, in which polypeptides are
displayed as a coat protein from a bacteriophage and
screened against an immobilized receptor. See, e.g.,
Dower et al., WO 91/17271; McCafferty et al., WO
92/01047; Ladner, US 5,223,409 (incorporated by
reference in their entirety for all purposes). The
method permits mass screening of large libraries of
polypeptides. However, applications of the method have
largely been confined to screening protein-protein
binding. Other mass screening methods have been
developed for isolating nucleic acid sequences that bind
to proteins. See Gold et al., U.S. 5,270,163. Both the
phage-display method and Gold's method screen binding in
vitro. Ladner et al., US 5,096,815 and US 5,198,346 have
proposed in vivo methods for screening DNA binding
proteins.

There have also been a number of studies
investigating the structure and function of RNA binding
proteins. Selby & Peterlin, Cell 62, 769-776 (1990) and
Venkatesan et al., J. Virol. 66, 7469-7480 (1992) have
reported that a eucaryotic transcriptional activator is
functional when bound to RNA via a foreign peptide.
Stripecke et al., Mol. Cell. Biol. 14, 5898-5908 (1994)
have reported that insertion of a binding site for a
phage coat protein 5' to a eucaryotic mRNA results in
suppression of translation in the presence of the coat
protein. Franklin, J. Mol. Biol. 231, 343-360 (1993) has
discussed methods of screening for variants of the phage
λ N antiterminator protein, and reported an arginine-rich
domain at the N-terminus of the protein in which
mutations impair antiterminator function. MacWilliams et
al., Nucleic Acids Res. 21, 5754-5760 (1993) have
discussed a modified P22 phage in which the ribosome
binding site of the ant gene (responsible for the lytic
state) was replaced with the RNA binding site of a
protein from a different phage, R17. Mutants of the RNA
binding site were screened by propagating the hybrid
phage in cells expressing an R17 translational inhibitor
with affinity for the ribosome binding site and
determining the relative numbers of lysogenic to lytic
phage.

Notwithstanding these developments, there remains a
need for efficient methods of large-scale screening of
RNA binding proteins in vivo. The present invention
fulfills this and other needs.

SUMMARY OF THE INVENTION

The invention provides methods and kits for
screening one or more polypeptides for specific
binding affinity for a selected RNA recognition sequence.
Some methods screen a plurality of polypeptides with
potential RNA binding activity for binding to a selected
RNA recognition sequence. In these methods, a library of
cells is cultured. Each cell in the library comprises
first and second DNA segments, which may be present on
the same or different vectors. The first DNA segment
supplies the polypeptides to be tested. The first DNA
segment thus encodes a fusion protein comprising a
fragment of an anti-terminator protein having
anti-terminator activity linked in-frame to a polypeptide
under test, which varies between cells in the library.
The second DNA segment supplies the reporter system. The
second DNA segment encodes, in operable linkage, a
promoter, an RNA recognition sequence foreign to the
anti-terminator protein, a transcription termination site
and a reporter gene. The termination site blocks
transcription of the reporter gene in the absence of a
protein with anti-termination activity and affinity for
the RNA recognition sequence. The first DNA segment is
expressed to yield the fusion protein, which, if the
polypeptide under test has a specific affinity for the
recognition sequence, binds via the polypeptide to the
RNA recognition sequence of a transcript from the second
DNA segment, thereby inducing transcription of the second
DNA segment to proceed through the termination site to
the reporter gene, resulting in expression of the reporter
gene. Expression of the reporter gene is detected in a
cell from the library. The expression indicates that the
cell comprises a polypeptide having RNA binding activity.

Often the cells of the library are procaryotic cells,
preferably E. coli. Often the anti-terminator protein is
a phage antiterminator protein, such as the phage λ N
protein. In such case, the second DNA segment usually
also encodes a Box A sequence. The Box A sequence
interacts with a host elongation factor, stimulating
antitermination activity of the fusion protein.

In some methods, the polypeptides being screened are
random polypeptides. In some methods, the polypeptides
are variants of a naturally occurring polypeptide such as
the HIV Rev protein. In other methods, the polypeptides
are naturally occurring polypeptides from a cDNA or
genomic library. The number of polypeptides to be
screened can be quite large (e.g., about 10^8).

The invention further provides methods for screening
a library of RNA fragments for binding activity to a
selected polypeptide. These methods are analogous to the
methods of screening polypeptides, except that the
polypeptide is kept constant and the RNA molecules are
varied. In these methods, a library of cells is
cultured. Each cell comprises first and second DNA
segments, which may be present on the same or separate
vectors. The first DNA segment encodes a fusion protein
comprising a fragment of a procaryotic anti-terminator
protein having anti-terminator activity linked in-frame
to a selected polypeptide. The second DNA segment
encodes, in operable linkage, a promoter, an RNA sequence
varying between different cells in the library, a
termination site and a reporter gene, wherein the
termination site blocks transcription of the reporter
gene unless the RNA sequence has a specific affinity for
the selected polypeptide. The first DNA segment is
expressed to yield the fusion protein, which, if the RNA
sequence has a specific affinity for the selected
polypeptide, binds via the selected polypeptide to the
RNA sequence of a transcript from the second DNA segment,
thereby inducing transcription of the second DNA segment
to proceed through the termination site to the reporter
gene, resulting in expression of the reporter gene.
Expression of the reporter gene in a cell from the
library is detected, indicating that the cell comprises an
RNA sequence having affinity for the polypeptide. The
cell may then be isolated.

In another aspect, the invention provides kits for
screening polypeptides for binding to an RNA molecule (or
vice versa). The kits comprise recombinant DNA segments
incorporated in one or more vectors, as described above.



BRIEF DESCRIPTION OF THE FIGURES
Fig. 1A: Arginine-rich peptides and their specific
RNA-binding sites: HIV Rev 34-47 (E47->R) and RRE IIB (top
left); λ N 1-19 and box B (top right); BIV Tat 68-81 and
BIV TAR (bottom left); HIV Tat 49-57 and HIV TAR (bottom
right). Amino acids important for binding are indicated
in bold and binding sites in the RNAs are boxed.
Important amino acids in λ N are tentatively assigned
from mutagenesis of the intact protein (Franklin, supra).

Fig. 1B: The genetic code viewed from the
perspective of arginine-rich peptides. Amino acids
important for binding in the four peptide model systems
are indicated in bold. A restricted genetic code (bold
box) encodes all charged and hydrophilic residues,
glycine, and proline, and contains all six arginine
codons.

Fig. 2: Effect of amino acid context and peptide
length on HIV Rev and BIV Tat peptide activities. Amino
acids 1-19 of the λ N protein were replaced with the
peptides shown. The Rev 34-47 peptides contain a
substitution of glutamic acid 47 to arginine, included to
maintain the overall charge of the peptide.
Antitermination assays were performed with corresponding
RRE and BIV TAR reporters.

Fig. 3: Antitermination activities of BIV Tat I79
mutants on the BIV TAR reporter. β-galactosidase
activities determined by the ONPG assay are expressed as
percent of wild-type BIV Tat peptide activity and are
plotted next to activities previously determined in HeLa
cells using an HIV LTR-CAT reporter. Activities
determined by X-gal assays also are shown (+,-).

Fig. 4A: RNA-binding gel shift assay of selected
peptides. Synthetic Rev 34-47 (S34) or selected peptides
were bound to wild-type or mutant RRE IIB RNA hairpins at
the peptide concentrations indicated (nM).

Fig. 4B: Circular dichroism spectra of Rev 34-40 (■),
Rev 34-47 (S34) (O), Rev-like peptide clone 24 (□), clone 57
peptide (•), and clone 41 peptide (0).

Fig. 5: Exemplary vectors for screening RNA binding
polypeptides. Polypeptides are cloned between the NcoI
and BsmI sites of pBRN* in-phase with a fragment of the
phage λ N protein. The plasmid expresses the polypeptide
as a fusion protein from the tac promoter. The
pACN-Tester plasmid encodes (clockwise) a promoter, a box A
site, an RNA recognition site, three termination sites
and a lacZ reporter gene. The fusion protein encoded by
the pBRN* vector binds via the polypeptide moiety to the
RNA recognition site, thereby allowing transcription to
proceed through the three termination sites to the lacZ
reporter gene.

DEFINITIONS 

A specific binding affinity of an RNA binding
polypeptide for an RNA binding site refers to a
dissociation constant ≤ 10 µM, preferably ≤ 100 nM and
most preferably ≤ 10 nM, and the capacity to bind one (or
more) RNA binding sites more strongly (i.e., at least
5-fold, 10-fold, 100-fold or 1000-fold) than others.
Dissociation constants as low as 1 nM, 1 pM or 1 fM are
possible for protein-RNA binding.
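
The two-part definition above (an absolute dissociation constant threshold plus a fold-preference over other sites) can be expressed as a simple check. The following Python sketch is illustrative only; the function name, argument names and the default choice of the 10 µM and 5-fold thresholds are assumptions made for the example, taken from the broadest values recited above.

```python
def has_specific_affinity(kd_target_molar, kd_other_sites_molar,
                          max_kd_molar=10e-6, min_fold=5.0):
    """Illustrative test of the two criteria in the definition above.

    kd_target_molar      -- dissociation constant (M) for the site of interest
    kd_other_sites_molar -- dissociation constants (M) for comparison sites
    max_kd_molar         -- absolute threshold (default 10 micromolar)
    min_fold             -- required fold-preference (default 5-fold)
    """
    # Criterion 1: the target site must be bound tightly enough.
    if kd_target_molar > max_kd_molar:
        return False
    # Criterion 2: every other site must be bound at least min_fold more
    # weakly (a larger Kd means weaker binding).
    return all(kd / kd_target_molar >= min_fold
               for kd in kd_other_sites_molar)


# A 1 nM binder that binds two other sites at 100 nM and 1 µM qualifies.
print(has_specific_affinity(1e-9, [1e-7, 1e-6]))
```

Tightening `max_kd_molar` to 100e-9 or 10e-9 and `min_fold` to 10, 100 or 1000 corresponds to the preferred ranges recited in the definition.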

A DNA segment is operably linked when placed into a
functional relationship with another DNA segment. For
example, a promoter is operably linked to a coding
sequence if it stimulates the transcription of the
sequence. Generally, DNA sequences that are operably
linked are contiguous, and in the case of two amino acid
coding sequences, both contiguous and in reading phase.
Linking is accomplished by ligation at convenient
restriction sites or at adapters or linkers inserted in
lieu thereof.

Peptide or polypeptide refers to a polymer in which
the monomers typically are alpha-(L)-amino acids joined
together through amide bonds. Peptides are at least two
and usually three or more amino acid monomers long.
Standard abbreviations for amino acids are used (see
Stryer, Biochemistry (3rd ed., 1988) incorporated by
reference in its entirety for all purposes). The term
protein is used to refer to a full-length natural
polypeptide or a synthetic polypeptide that is
sufficiently long to have a self-sustaining secondary
structure (e.g., α-helix or β-pleated sheet) and at least
one functional domain.

Random peptide refers to an oligomer composed of two
or more amino acid monomers and constructed by a means
with which one does not entirely preselect the complete
sequence of a particular oligomer.

A random peptide library refers not only to a set of
recombinant DNA vectors (also called recombinants) that
encodes a set of random peptides, but also to the set of
random peptides encoded by those vectors, as well as the
set of fusion proteins containing those random peptides.
Random peptide libraries frequently contain as many as
10^6 to 10^12 different compounds.

The lefthand direction of a polypeptide is the
amino-terminal direction and the righthand direction is the
carboxy-terminal direction, in accordance with
convention. Similarly, unless specified otherwise, the
lefthand end of single-stranded polynucleotide sequences
is the 5' end; the lefthand direction of double-stranded
polynucleotide sequences is referred to as the 5'
direction. The direction of 5' to 3' addition of nascent
RNA transcripts is referred to as the transcription
direction; sequence regions on the DNA strand which are
5' to the RNA transcript are referred to as "upstream
sequences"; sequence regions on the DNA strand which are
3' to the RNA transcript are referred to as "downstream
sequences."

A variant of a natural polypeptide usually exhibits
at least 20%, and more usually at least 50%, sequence
similarity to the natural polypeptide. The term sequence
similarity means that peptides have identical or similar
amino acids (i.e., conservative substitutions) at
corresponding positions.

The polypeptides of the present invention are
obtained in a substantially pure form, typically being at
least 50% weight/weight (w/w) or higher purity, and being
substantially free of interfering proteins and
contaminants, such as those which may result from
expression in cultured cells. Preferably, the peptides
are purified to at least 80% w/w purity, more preferably
to at least 95% w/w purity. For use in pharmaceutical
compositions, the polypeptide purity should be very high,
typically being at least 99% w/w purity, and preferably
being higher.

DETAILED DISCLOSURE 

I. General

The invention provides methods of screening for RNA 
binding proteins that have desirable binding 
characteristics to selected RNA sequences. The methods 
can be used to isolate variants of known RNA binding 
proteins having altered (usually strengthened) binding 
characteristics. The methods are also useful for 
isolating hitherto unknown RNA binding proteins to any 
RNA sequence of interest. The unknown RNA binding 
proteins may be natural proteins encoded by cDNA or 
genomic libraries or synthetic peptides selected from a 
random combinatorial library. The methods can also be
applied to screening a library of RNA recognition
sequences for binding to an RNA binding protein of
interest.

II. The Screening System 
The screening system has two recombinant DNA
components. A first DNA segment encodes the polypeptide
to be screened for RNA binding activity, and the second
segment encodes a reporter system to detect the presence
of such activity. In the first DNA segment the
polypeptide to be screened is fused in-frame to a
fragment of an anti-terminator protein such that the
combined coding sequence is operably linked to a
promoter. The promoter should be compatible with the
cell type in which screening is to be performed.
Suitable promoters for use in the preferred cell type, E.
coli, include tac, trp, lac, T3 or T7. Anti-terminator
proteins include the N proteins of phages λ, 21 and P22,
which have been completely sequenced. See Franklin, J.
Mol. Biol. 181, 85-91 (1985); Lazinski et al., Cell 59,
207-218 (1989).

The anti-terminator proteins of phages λ, 21 and P22
contain an arginine-rich domain corresponding to about
amino acids 1-19 at the N-terminus of the protein. This
domain is responsible for RNA binding activity of these
antiterminator proteins, while the remainder of each
protein confers anti-terminator activity. In the present
invention, the RNA binding domain of the anti-terminator
protein is usually deleted and replaced with the
polypeptide to be screened. Thus, the polypeptide being
screened is fused to a fragment of an antiterminator
protein that retains antiterminator activity but usually
lacks endogenous RNA binding activity.

It is not necessary that the fragment of the
antiterminator protein be the minimum domain responsible
for antiterminator activity. Although the natural RNA
binding domain of the antiterminator protein is usually
completely or partly deleted, such deletion is probably
not essential. Thus, for example, the sequence encoding
the RNA binding polypeptide to be screened may also be
arranged in tandem with the endogenous RNA binding domain
of the antiterminator protein. The fusion polypeptide
usually comprises, from N-terminus to C-terminus, the
polypeptide being screened followed by the antiterminator
domain. However, the components may also be assembled in
other operable combinations.

In some arrangements, the polypeptide being screened
is also fused to a linker or spacer polypeptide. The
linker can be inserted between the polypeptide being
screened and the antiterminator protein or on the side of
the polypeptide distal to the antiterminator protein. A
linker (or spacer) refers to a molecule or group of
molecules that connects two molecules or two parts of a
single molecule. A linker serves to place the two
molecules in a preferred configuration, e.g., so that
each domain is functional without steric hindrance from
the other. The spacer can be as short as one residue,
can be five to ten residues, or can be up to about 100
residues long. The spacer residues may be somewhat
flexible, comprising polyglycine or (Gly3Ser)4, for
example. Alternatively, rigid spacers can be formed
predominantly from Pro and Gly residues. Hydrophilic
spacers, made up of charged and/or uncharged hydrophilic
amino acids (e.g., Thr, His, Asn, Gln, Arg, Glu, Asp, Met,
Lys, etc.), or hydrophobic spacers made up of hydrophobic
amino acids (e.g., Phe, Leu, Ile, Gly, Val, Ala) can be
used to present the RNA binding site with a variety of
local environments.

In all of these arrangements, the first recombinant
DNA segment expresses a fusion protein in which one
component is the polypeptide to be tested for RNA binding
activity and the second component has antiterminator
activity.
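
The usual N-terminus to C-terminus arrangement described above (test polypeptide, optional spacer, antiterminator fragment) can be illustrated with one-letter amino acid strings. This Python sketch is a minimal illustration, not a construct design: the function names are hypothetical, and the test peptide and antiterminator fragment in the usage lines are placeholder strings of "X" (any residue), not real sequences.

```python
def gly_ser_linker(repeats=4):
    """Return a (Gly3Ser)n flexible spacer in one-letter code.

    With the default of four repeats this is the (Gly3Ser)4 spacer
    mentioned in the text.
    """
    return "GGGS" * repeats


def assemble_fusion(test_peptide, antiterminator_fragment, linker=""):
    """Concatenate the parts in the usual order: test polypeptide at the
    N-terminus, optional linker, then the antiterminator fragment."""
    return test_peptide + linker + antiterminator_fragment


# Placeholder sequences only (X = unspecified residue).
peptide = "X" * 10
fragment = "X" * 20
fusion = assemble_fusion(peptide, fragment, gly_ser_linker())
print(gly_ser_linker())   # GGGSGGGSGGGSGGGS
print(len(fusion))        # 46
```

Other operable arrangements mentioned above, such as placing the linker distal to the antiterminator fragment, would simply change the concatenation order.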

The second recombinant DNA segment containing the
reporter system has at least four components in operable
linkage. The components are a promoter, an RNA binding
site, a transcription termination site and a reporter
gene. A Box A site may also be present. Virtually any
promoter functional in the cell type being screened can
be used. Preferred promoters are the same as those
listed above for the first DNA segment.

The reporter gene can be any gene that confers a
selectable or screenable property when it is expressed.
Suitable reporter genes include the β-galactosidase gene,
antibiotic resistance genes, such as CAT or AMP, and
genes having a fluorescent expression product such as the
green fluorescent protein gene.

The choice of termination site for the second DNA
segment is not usually critical. Termination sites are
RNA sequences of 50-100 bases downstream from the
translational stop site of a protein coding sequence.
Frequently, RNA termination sites can fold to a hairpin
structure. Termination sites are recognized by RNA
polymerase as a signal to cease transcription. See Von
Hippel, Science 255, 809 (1992). In eucaryotic cells,
the selection of termination site depends on the promoter
to which the reporter gene is linked. However, in
procaryotic cells, an antitermination protein recognizes
virtually any procaryotic termination site, so the choice
of termination site is not critical. In some vectors,
multiple termination sites are included in tandem.

A further component of the second DNA segment
constituting the reporter system is a DNA sequence
encoding a known or potential RNA binding sequence. The
RNA binding site is usually foreign to the
antitermination protein (or fragment thereof) encoded by
the first DNA segment. That is, the RNA binding site is
not naturally bound by the antitermination protein
(or fragment thereof). In other words, if the
antitermination protein is the phage λ N protein, an RNA
binding site other than the Box B sequence in NutR or
NutL is present. The Box B sequence of NutL or NutR may
or may not be removed from the second DNA segment.
Preferably, the Box B sequence is removed and replaced by
the foreign RNA binding site. Usually, the foreign RNA
binding site lacks a specific affinity for the
antitermination protein (or fragment thereof) encoded by
the first DNA segment. The use of a foreign RNA binding
site ensures that binding of the fusion protein to the
RNA binding site occurs via the polypeptide moiety being
screened rather than an endogenous RNA binding domain of
the antiterminator protein. The possibility of binding
through the endogenous RNA binding domain of the
antiterminator protein can also be eliminated by deleting
this domain.

Sometimes (e.g., when the antiterminator protein is
a phage protein), the second recombinant DNA segment
includes a Box A site as an additional component. Box A
is a conserved sequence originally defined as a component
of the Nut sequences present in phages λ, 21 and P22.
Box A sequences also exist in a variety of antiterminated
operons including the ribosomal RNA operons of E. coli.
Friedman & Olson, Cell 34, 143-149 (1983); Li et al.,
Cell 38, 851-860 (1984). The complete sequences of Nut
sites including Box A and Box B domains from phages λ, 21
and P22 are given by Lazinski et al., Cell 59, 207-218
(1989) (incorporated by reference in its entirety for all
purposes). Box B is responsible for binding an
antiterminator protein. Box A, a sequence of 8-12
nucleotides, which is proximal to the promoter in a
natural operon, is responsible for binding a host
elongation factor that interacts with the antitermination
protein (or in the present methods the fusion protein
having antitermination activity) to stimulate
antitermination activity. See Greenblatt et al., Nature
364, 401-406 (1993) (incorporated by reference in its
entirety for all purposes). The Box A domain should
preferably match the antiterminator protein (or fragment
thereof) encoded by the first segment. Thus, if the
antiterminator protein is the phage λ N protein, one may
choose a λ NutL or NutR Box A sequence, which differ
slightly in nucleotide sequence. Analogously, if the
antitermination protein is a phage P22 N protein, one may
choose a P22 Nut Box A sequence.

The promoter and reporter gene are linked to achieve
expression with the promoter upstream from the reporter.
The RNA binding site and termination site are usually
between the promoter and the reporter gene, with the RNA
binding site proximal to the promoter. In some
arrangements, Box A is present, usually between the
promoter and the termination site. In a natural
antitermination system, a Box A site is juxtaposed with a
Box B site (e.g., in phage Nut sites). In the present
reporter system, the Box B site may or may not be
present. Preferably, Box B is absent and replaced by the
foreign RNA binding site. The exact spacing of the
components of the reporter system is not thought to be
critical. All that is required is that the termination
site blocks expression of the reporter gene from the
promoter in the absence of a protein with antitermination
activity and a specific binding affinity for the RNA
recognition sequence.

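The ordering constraints in the preceding paragraph can be written down as a small layout check. The Python sketch below is illustrative only: the component labels ("promoter", "box_a", "rna_site", "terminator", "reporter") and the function name are assumptions chosen for the example, and the check covers only the usual arrangement described above, not every operable arrangement.

```python
def check_reporter_layout(components):
    """Return a list of problems for a 5'-to-3' ordered list of component
    labels; an empty list means the usual arrangement is satisfied."""
    problems = []
    for name in ("promoter", "rna_site", "terminator", "reporter"):
        if name not in components:
            problems.append(f"missing {name}")
    if problems:
        return problems

    promoter = components.index("promoter")
    rna_site = components.index("rna_site")
    first_term = components.index("terminator")
    # Several terminators may be present in tandem; find the last one.
    last_term = len(components) - 1 - components[::-1].index("terminator")
    reporter = components.index("reporter")

    if not promoter < rna_site < first_term:
        problems.append(
            "RNA binding site must lie between promoter and terminator")
    if not last_term < reporter:
        problems.append("terminator(s) must precede the reporter")
    if "box_a" in components:
        box_a = components.index("box_a")
        if not promoter < box_a < first_term:
            problems.append(
                "Box A must lie between promoter and terminator")
    return problems


# Layout following the order given for the tester plasmid of Fig. 5.
tester = ["promoter", "box_a", "rna_site",
          "terminator", "terminator", "terminator", "reporter"]
print(check_reporter_layout(tester))   # []
```

Because exact spacing is not thought to be critical, the sketch checks only relative order and ignores distances between components.
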
The two DNA segments can be contained in the same or
separate DNA vectors. If separate vectors are used, the
two vectors should have compatible origins of
replication. Vectors can be introduced into cells by
chemical transformation or electroporation.
Electroporation is preferred for generation of large
combinatorial libraries. Either a eucaryotic or
procaryotic host cell line can be used. Any strain of
bacteria (e.g., Streptomyces, Bacillus) compatible with
the selected vectors is suitable. However, standard
laboratory strains of E. coli are preferred because of
the higher transformation efficiencies obtainable. If
two vectors are used, the vectors preferably contain
different antibiotic resistance genes, allowing selection
for cells maintaining both vectors. The antibiotic
resistance genes used to ensure maintenance of plasmids
should be different from any antibiotic resistance gene
used as a reporter. Exemplary vectors are shown in Fig.
5.

After introduction of the two recombinant DNA
segments into a cell, either on the same or separate
vectors, the selection works as follows. The first DNA
segment is expressed to yield a fusion protein. The
fusion protein comprises a polypeptide to be screened for
RNA binding activity and a second domain having
antitermination activity. The second DNA segment is
transcribed only to a limited extent. Transcription
proceeds through the RNA binding site but is stopped by
the termination site before it reaches the reporter gene.
If the polypeptide being screened has a specific affinity
for the RNA binding sequence included in the reporter
system, the fusion protein binds to this sequence via the
polypeptide being screened. The other portion of the
fusion protein with antiterminator activity releases the
blockage of transcription caused by the termination site.
Thus, transcription proceeds through the reporter gene,
which is expressed. If the polypeptide being screened
lacks a specific affinity for the RNA recognition
sequence, transcription of the reporter gene remains
blocked. The presence of the reporter gene product
therefore indicates that the polypeptide under test has
RNA binding activity for the RNA binding sequence
included in the reporter system.
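
The selection logic just described reduces to a simple rule: the reporter is expressed only when the fusion protein both binds the RNA site in the reporter transcript and retains antitermination activity. The following Python sketch models that rule in an idealized, all-or-nothing form. The class names, field names and site labels are hypothetical, and real assays give graded rather than binary reporter signals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionProtein:
    # Names of RNA sites the test polypeptide binds with specific affinity.
    binds: frozenset
    # Whether the antiterminator fragment retains its activity.
    antiterminator_active: bool = True


@dataclass(frozen=True)
class ReporterConstruct:
    rna_site: str            # RNA recognition sequence in the transcript
    n_terminators: int = 1   # termination sites upstream of the reporter


def reporter_expressed(fusion, reporter):
    """Idealized outcome of the selection for one cell."""
    if reporter.n_terminators == 0:
        # Without a termination site nothing blocks read-through.
        return True
    # Read-through requires binding via the test polypeptide AND
    # antitermination activity from the other part of the fusion.
    return fusion.antiterminator_active and reporter.rna_site in fusion.binds


rev_like = FusionProtein(binds=frozenset({"RRE IIB"}))
print(reporter_expressed(rev_like, ReporterConstruct("RRE IIB", 3)))  # True
print(reporter_expressed(rev_like, ReporterConstruct("BIV TAR", 3)))  # False
```

Screening a library amounts to evaluating this rule once per cell, with `binds` varying between cells when polypeptides are screened, or `rna_site` varying when RNA sequences are screened.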

III. RNA Binding Protein and Recognition Sequences

Natural RNA binding proteins often have one domain
responsible for RNA binding and other domains responsible
for other functions. The domain responsible for RNA
binding can sometimes be recognized by a characteristic
motif. The most widely found RNA recognition sequence or
binding motif is the RNP motif. The RNP motif is a
90-100 amino acid sequence that is present in one or more
copies in proteins that bind pre-mRNA, mRNA,
pre-ribosomal RNA and snRNA. The consensus sequence and the
sequences of several exemplary proteins containing the
RNP motif are provided by Burd and Dreyfuss, supra. See
also Swanson et al., Trends Biochem. Sci. 13, 86 (1988);
Bandziulis et al., Genes Dev. 3, 431 (1989); Kenan et al.,
Trends Biochem. Sci. 16, 214 (1991). The consensus motif
contains two short consensus sequences, RNP-1 and RNP-2.
Some RNP proteins bind specific RNA sequences with high
affinities (dissociation constant in the range of 10^-8 to
10^-11 M). Such proteins often function in RNA processing
reactions. Other RNP proteins have less stringent
sequence requirements and bind less strongly
(dissociation constant ~10^-6 to 10^-7 M). Burd & Dreyfuss,
EMBO J. 13, 1197 (1994).
A second characteristic RNA binding motif found in
viral, phage and ribosomal proteins is an arginine-rich
motif (ARM) of about 10-20 amino acids. RNA binding
proteins having this motif include the HIV Tat and Rev
proteins. Rev binds with high affinity (dissociation
constant 10^-9 M) to an RNA sequence termed RRE, which is
found in all HIV mRNAs. Zapp et al., Nature 342, 714
(1989); Dayton et al., Science 246, 1625 (1989). Tat
binds to an RNA sequence termed TAR with a dissociation
constant of 5 x 10^-9 M. Churcher et al., J. Mol. Biol.
230, 90 (1993). For Tat and Rev proteins, a fragment
containing the arginine-rich motif binds as strongly as
the intact protein. In other RNA binding proteins with
ARM motifs, residues outside the ARM also contribute to
binding. Other families of RNA binding proteins with
different binding motifs are described by Burd and
Dreyfuss, supra.

IV. Combinatorial Strategies 

In some embodiments, the methods of the invention
are used to isolate RNA binding polypeptides to an RNA
sequence known to bind a natural RNA binding protein. A
selected RNA sequence is inserted into the second DNA
segment (screening system) as described above and a
library of polypeptides is inserted into the first DNA
segment. The library of polypeptides can be a random
library of short peptides about 6-25 amino acids long.
The library can also constitute variant forms of a
naturally occurring RNA binding protein. In this case,
the naturally occurring RNA binding protein may or may
not be the natural partner for the selected RNA binding
site. The members of the library can be of similar
length to a full-length naturally occurring RNA binding
protein, or can be much shorter, including predominantly
the RNA binding motif. If full-length proteins are to be
screened, variant amino acids are concentrated in the RNA
binding domain of the full-length protein. In some
methods, all of the amino acids within an RNA binding
domain are varied. In other methods, a framework of
amino acids is kept constant, and only selected amino
acids varied. The framework is usually formed from amino
acids that contribute to the global three-dimensional
structure of the protein but do not directly contact the
target RNA molecule. Selected residues for variation are
preferably those that directly contact the target or
amino acids proximal to such amino acids. In some
polypeptides, at least five residues are selected for
variation. The scope of variation at each position can
encompass all twenty amino acids or a more limited
repertoire. For example, for RNA binding proteins having
an ARM motif, the repertoire might in some instances be
limited to charged and hydrophilic residues.

In other methods, polypeptides to be screened are 
obtained from natural cDNA or genomic libraries. Such
libraries are inserted into the first DNA segment as
described above. These methods are particularly useful
for isolating cognate and allelic variants of known RNA
binding proteins.

The methods can also be used to identify RNA binding
polypeptides to RNA sequences having no known binding
protein. For example, one might want to isolate an RNA
binding polypeptide to a unique RNA sequence that occurs
in the RNA of a pathogen but does not occur in humans or
other mammals. RNA sequences proximal to transcriptional
or translation initiation sites are particularly suitable.
A DNA segment encoding the selected RNA sequence is
cloned into the reporter system. A library of
polypeptides, which can be random polypeptides or
variants of a known RNA binding protein, is cloned into
the first DNA segment and screened as described above.

The methods are also useful for isolating antibodies 
with a specific affinity for a selected RNA sequence. 
Libraries from unimmunized human B cells are prepared 
according to the general protocol outlined by Huse et
al., Science 246:1275-1281 (1989). The heavy and light
chains can be screened individually or as a complex for
binding activity. For individual screening, a library of
heavy (or light) chains is cloned into the first
recombinant DNA segment in frame with the antiterminator
protein. The library is then screened as for other
libraries of potential RNA binding polypeptides. For
screening a complex of both chains, the chains can be
linked by a spacer and the combined heavy-spacer-light
chain expressed as a fusion protein with the
antiterminator protein. See Ladner, US 5,260,203.
Alternatively, one of the chains can be expressed as a
fusion protein with the antiterminator protein or
fragment (as for any other polypeptide being screened)
and the other chain expressed from a separate promoter,
which may be on the same or a different vector as the
first chain.

In a further variation, the methods can be used to
identify an RNA sequence that binds to a selected RNA
binding protein. Such methods are useful, for example,
in mapping the RNA binding site of the selected protein.
In this situation, a DNA sequence encoding the selected
protein (or the RNA binding domain thereof) is cloned
into the first DNA segment (i.e., linked to the
antiterminator domain) and a library of DNA encoding
variable RNA segments is cloned into the second DNA
segment (i.e., the reporter system). The library can be
random, contain variants of a selected consensus
sequence, or can contain a family of sequences varying in
a systematic fashion. For example, to map an RNA binding
site within the context of a larger RNA sequence, one
requires a series of overlapping oligonucleotides
encoding fragments of the RNA sequence.
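
The series of overlapping fragments described above can be sketched as a simple sliding window. This is only an illustration under stated assumptions: the function name `overlapping_fragments`, the window length, and the step size are hypothetical choices, since the specification does not fix any of them, and the example sequence is made up.

```python
def overlapping_fragments(rna, length, step):
    """Tile `rna` with windows of `length` nucleotides advanced by `step`.

    Consecutive windows overlap by (length - step) nucleotides. Both
    parameters are hypothetical; the specification does not fix them.
    Returns a list of (start_index, fragment) pairs.
    """
    if not 0 < step <= length:
        raise ValueError("step must be between 1 and length")
    if len(rna) <= length:
        return [(0, rna)]
    last_start = len(rna) - length
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # extra window so the 3' end is covered
    return [(s, rna[s:s + length]) for s in starts]


if __name__ == "__main__":
    # Made-up 11-nt sequence, 4-nt windows shifted by 2 nt.
    for start, fragment in overlapping_fragments("ACGUACGUACG", 4, 2):
        print(start, fragment)
```

Each fragment would then be encoded by an oligonucleotide and cloned into the reporter system; a smaller step gives finer mapping at the cost of more constructs.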

Libraries are constructed by cloning an
oligonucleotide which contains the variable region of
library members (and any spacers and nonvariable
framework determinants) into the selected cloning site.
Using known recombinant DNA techniques (see generally,
Sambrook et al., Molecular Cloning, A Laboratory Manual,
2d ed., Cold Spring Harbor Laboratory Press, Cold Spring
Harbor, N.Y., 1989, incorporated by reference in its
entirety for all purposes), an oligonucleotide may be
constructed which, inter alia, removes unwanted
restriction sites and adds desired ones, reconstructs the
correct portions of any sequences which have been removed
(such as a correct signal peptidase site, for example),
inserts the spacer conserved or framework residues, if
any, and corrects the translation frame (if necessary) to
produce a fusion protein. A portion of the
oligonucleotide will generally contain one or more
variable region domain(s) and the spacer or framework
residues. The sequences are ultimately expressed as
peptides (with or without spacer or framework residues).
The variable region domain of the oligonucleotide
comprises the source of the library. The size of the
library varies according to the number of variable
codons, and hence the size of the peptides, which are
desired. Generally the library will be at least about 10^4
or 10^6 members, usually at least 10^7, and typically 10^8 or
more members.

To generate the collection of oligonucleotides which
forms a series of codons encoding a random collection of
amino acids and which is ultimately cloned into the
vector, a codon motif is used, such as (NNK)x, where N may
be A, C, G, or T (nominally equimolar), K is G or T
(nominally equimolar), and x is typically up to about 5,
6, 7, or 8 or more, thereby producing libraries of penta-,
hexa-, hepta-, and octa-peptides or more. The third
position may also be G or C, designated "S". Thus, NNK
or NNS (i) code for all the amino acids, (ii) code for
only one stop codon, and (iii) reduce the range of codon
bias from 6:1 to 3:1. The expression of peptides from
randomly generated mixtures of oligonucleotides in
appropriate recombinant vectors is discussed in Oliphant
et al., Gene 44:177-183 (1986), incorporated herein by
reference.

The codon motif (NNK)6 produces 32 codons, one for
each of 12 amino acids, two for each of five amino acids,
three for each of three amino acids and one (amber) stop
codon. Although this motif produces a codon distribution
as equitable as available with standard methods of
oligonucleotide synthesis, it results in a bias against
peptides containing one-codon residues. For example, a
complete collection of hexacodons contains one sequence
encoding each peptide made up of only one-codon amino
acids, but contains 729 (3^6) sequences encoding each
peptide with only three-codon amino acids.
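
The codon arithmetic above can be checked by enumerating the motif against the standard genetic code. The following is a minimal sketch, not part of the described method: the helper names `codon_counts` and `summarize` are hypothetical, and the standard (universal) genetic code is assumed.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_counts(third_position):
    """Count codons per amino acid for N-N-<third_position> ('*' = stop)."""
    counts = Counter()
    for a, b, c in product(BASES, BASES, third_position):
        counts[CODON_TABLE[a + b + c]] += 1
    return counts


def summarize(counts):
    """Return (total codons, stop codons, {codons per residue: number of residues})."""
    residues = {aa: n for aa, n in counts.items() if aa != "*"}
    return sum(counts.values()), counts["*"], dict(Counter(residues.values()))


if __name__ == "__main__":
    print("NNK:", summarize(codon_counts("GT")))
    print("NNS:", summarize(codon_counts("GC")))
    print("NNN:", summarize(codon_counts("TCAG")))
    # Ratio between the most and least represented hexapeptides under (NNK)6.
    nnk = codon_counts("GT")
    per_residue = [n for aa, n in nnk.items() if aa != "*"]
    print("hexapeptide bias:", (max(per_residue) // min(per_residue)) ** 6)
```

Under these assumptions the enumeration reproduces the figures in the text: 32 codons with a single stop codon for both NNK and NNS, residues encoded by one, two, or three codons in a 12:5:3 split, and a 729-fold (3^6) spread between hexapeptides.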

An additional codon motif useful for generating
diversity in RNA binding proteins having an ARM motif is
CAG/CAG/N (see Fig. 1B). This limited genetic code
allows synthesis of all charged and hydrophilic amino
acids and is enriched in arginine residues. Combinations
of these amino acids are expected to encode a variety of
helical and nonhelical arginine-rich RNA-binding
peptides. Subsets of this restricted code may be devised
to favor certain types of peptide structures and RNA
interactions: boxed amino acids were used in the present
peptide library experiment. Hydrophobic amino acids are
excluded from the restricted code. An alternative
approach to minimize the bias against one-codon residues
involves the synthesis of 20 activated tri-nucleotides,
each representing the codon for one of the 20 genetically
encoded amino acids. These are synthesized by
conventional means, removed from the support while
maintaining the base and 5'-HO-protecting groups, and
activated by the addition of 3'-O-phosphoramidite (and
phosphate protection with β-cyanoethyl groups) by the
method used for the activation of mononucleosides, as
generally described in McBride and Caruthers, Tetrahedron
Letters 22:245 (1983), which is incorporated by reference
herein. Degenerate "oligocodons" are prepared using
these trimers as building blocks. The trimers are mixed
at the desired molar ratios and installed in the
synthesizer. The ratios will usually be approximately
equimolar, but may be a controlled unequal ratio to
obtain the over- to under-representation of certain amino
acids coded for by the degenerate oligonucleotide
collection. The condensation of the trimers to form the
oligocodons is done essentially as described for
conventional synthesis employing activated
mononucleosides as building blocks. See generally,
Atkinson and Smith, Oligonucleotide Synthesis, M.J. Gait,
ed. p35-82 (1984). Thus, this procedure generates a
population of oligonucleotides for cloning that is
capable of encoding an equal distribution (or a
controlled unequal distribution) of the possible peptide
sequences. This approach may be especially useful in
generating longer peptide sequences, since the range of
bias produced by the (NNK)6 motif increases by three-fold
with each additional amino acid residue.

When the codon motif is (NNK)x, as defined above, and
when x equals 8, there are 2.6 x 10^10 possible
octa-peptides. A library containing most of the octa-peptides
may be difficult to produce. Thus, a sampling of the
octa-peptides may be accomplished by constructing a
subset library of about 0.1%, and up to as much as
1%, 5% or 10% of the possible sequences, which subset is
then screened. As the library size increases, smaller
percentages are acceptable. If desired, to extend the
diversity of a subset library the recovered subset of
sequences may be subjected to mutagenesis and then
subjected to subsequent rounds of screening. This
mutagenesis step may be accomplished in two general ways:
the variable region of the recovered RNA binding
polypeptides can be mutagenized, or additional variable
amino acids may be added to the regions adjoining the
initial variable sequences.
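
The library-size figures in this paragraph follow from simple counting, sketched below. The function names are hypothetical, and the sketch assumes 20 possible residues and 32 NNK codons at each variable position.

```python
def peptide_space(x, residues=20):
    """Number of distinct peptides with x fully variable positions."""
    return residues ** x


def codon_space(x, codons_per_position=32):
    """Number of distinct (NNK)x coding sequences, including stop-containing ones."""
    return codons_per_position ** x


def subset_size(x, fraction):
    """Number of members in a subset library sampling `fraction` of the peptides."""
    return round(peptide_space(x) * fraction)


if __name__ == "__main__":
    print("octapeptides:", peptide_space(8))
    print("(NNK)8 coding sequences:", codon_space(8))
    for fraction in (0.001, 0.01, 0.05, 0.10):
        print(f"{fraction:.1%} subset:", subset_size(8, fraction))
```

For x = 8 the peptide count is 20^8 = 25,600,000,000, which is the 2.6 x 10^10 quoted above, so even a 0.1% subset corresponds to tens of millions of clones.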

A variety of techniques can be used to diversify a
peptide library or to diversify around peptides found in
early rounds of screening to have substantial specific
binding activity. In one approach, the positive RNA
binding polypeptides are sequenced to determine the
identity of the active peptides. Oligonucleotides are
then synthesized based on these peptide sequences,
employing a low level of all bases incorporated at each
step to produce slight variations of the primary
oligonucleotide sequences. This mixture of (slightly)
degenerate oligonucleotides is then rescreened as
described above. This method produces systematic,
controlled variations of the starting peptide sequences.
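
The effect of synthesizing with a low level of all bases at each step can be illustrated with a small simulation. This is a sketch under explicit assumptions: the doping level, the parent sequence, and the names `doped_oligo` and `doped_library` are all hypothetical, and each position is modelled as being independently replaced by one of the three other bases with probability `doping`.

```python
import random


def doped_oligo(parent, doping, rng):
    """Return one variant of `parent` (DNA, ACGT only).

    Each position keeps the parent base with probability 1 - doping and
    otherwise receives one of the three other bases, chosen uniformly.
    """
    others = {base: [b for b in "ACGT" if b != base] for base in "ACGT"}
    return "".join(
        rng.choice(others[base]) if rng.random() < doping else base
        for base in parent
    )


def doped_library(parent, doping, size, seed=0):
    """Generate `size` slightly degenerate variants of `parent`."""
    rng = random.Random(seed)
    return [doped_oligo(parent, doping, rng) for _ in range(size)]


if __name__ == "__main__":
    parent = "AGTCGTAGTAGTCGTCGT"  # made-up parent coding sequence
    for variant in doped_library(parent, doping=0.05, size=5):
        changes = sum(a != b for a, b in zip(parent, variant))
        print(variant, changes)
```

With this model the expected number of changes per oligonucleotide is simply the sequence length multiplied by the doping level, so the doping level sets how far the secondary library strays from the selected sequence.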

Another technique for diversifying around the
recognition kernel of a selected RNA binding polypeptide
involves the subtle misincorporation of nucleotide
changes in the peptide through the use of the polymerase
chain reaction (PCR) under low fidelity conditions.
Alteration of the ratios of nucleotides and the addition
of manganese ions can produce a 2% mutation frequency.
Yet another approach for diversifying the selected RNA
binding polypeptides involves the mutagenesis of a pool,
or subset, of recovered plasmids encoding polypeptides
with binding activity. The plasmids are mutagenized by
treatment with, e.g., nitrous acid, formic acid, or
hydrazine. These treatments produce a variety of damage
in the DNA. The damaged DNA is then copied with reverse
transcriptase which misincorporates bases when it
encounters a site of damage. The segment containing the
sequence encoding the variable peptide is then isolated
by cutting with restriction nuclease(s) specific for
sites flanking the variable region. This mutagenized
segment is then recloned and rescreened. See Myers et
al., Nucl. Acids Res. 13:3131-3145 (1985), Myers et al.,
Science 229:242-246 (1985), and Myers, Current Protocols
in Molecular Biology Vol I, 8.3.1 - 8.3.6 (Ausubel et
al., eds., Wiley, New York (1989)) (which are
incorporated by reference in their entirety for all
purposes).

In the second general approach, that of adding
additional amino acids to a peptide or peptides found to
be active, a variety of methods are available. In one,
the sequences of peptides selected after a first
screening are determined individually and new
oligonucleotides, incorporating the determined sequence
and an adjoining degenerate sequence, are synthesized.
These are then cloned to produce a secondary library.

In another approach which adds a second variable
region to a pool of plasmids encoding RNA binding
polypeptides, a restriction site is installed next to the
primary variable region. Preferably, the enzyme should
cut outside of its recognition sequence, such as BspMI
which cuts leaving a four base 5' overhang, four bases to
the 3' side of the recognition site. Thus, the
recognition site may be placed four bases from the
primary degenerate region. To insert a second variable
region, the pool of plasmid DNA is digested and
blunt-ended by filling in the overhang with Klenow fragment.
Double-stranded, blunt-ended, degenerately synthesized
oligonucleotide is then ligated into this site to produce
a second variable region juxtaposed to the primary
variable region. This secondary library is then
amplified and screened as before.

While in some instances it is appropriate to
synthesize peptides having contiguous variable regions to
bind certain RNA sequences, in other cases it is
desirable to provide peptides having two or more regions
of diversity separated by spacer residues. For example,
the variable regions may be separated by spacers which
allow the diversity domains of the peptides to be
presented to the receptor in different ways. The
distance between variable regions may be as little as one
residue, sometimes five to ten and up to about 100
residues. For probing a large binding site the variable
regions may be separated by a spacer of 20 to
30 amino acids. The number of spacer residues when
present will preferably be at least two, typically at
least three or more, and often will be less than ten,
more often less than eight residues.

Unless modified during or after synthesis by the
translation machinery, recombinant peptide libraries
consist of sequences of the 20 normal L-amino acids.
While the available structural diversity for such a
library is large, additional diversity can be introduced
by a variety of means, such as chemical modifications of
the amino acids. For example, a peptide library can have
its carboxy terminal amidated. See Eipper et al., J.
Biol. Chem. 266, 7827-7833 (1991).

V. Screening Procedures 

After transformation of vector(s) into the host
cells, the host cells are propagated in standard liquid
or solid laboratory media to allow expression of the
potential RNA binding polypeptides and the reporter
plasmid. The method of screening depends on the reporter
gene. If the reporter gene is β-galactosidase, cells are
screened for expression of the reporter gene by plating
on X-gal media. Cells expressing the gene give rise to
blue colonies. The intensity of blue is positively
correlated with the extent of expression of
β-galactosidase, which is in turn positively correlated
with the extent of binding of the potential RNA binding
polypeptide contained within a colony. Thus, simple
visual inspection of a plate gives some indication of the
colonies containing the RNA binding polypeptides with
strongest affinity. The extent of expression can be
quantified more accurately by propagating liquid cultures
from individual colonies on a plate and performing an
ONPG assay on permeabilized cells (see Example 4).

In an analogous approach, when the reporter gene is
a selectable gene such as CAT, colonies are plated on a
selective medium, and only colonies containing a
polypeptide with specific affinity for the selected RNA
recognition sequence grow. Colony size provides a simple
visual indication of the affinity of the RNA binding
protein. Affinity can be quantified more accurately by
measuring CAT levels in liquid culture. Use of a
selectable reporter gene is advantageous in that colonies
can be plated at higher density allowing screening of
larger libraries.

In a further approach, colonies are screened by FACS
analysis. A fluorescent signal can be generated by
treating cells containing a β-galactosidase reporter gene
with the substrate fluorescein di-β-D-galactopyranoside,
which breaks down to fluorescein. See Alvarez et al.,
Biotechniques 15, 975 (1993). Alternatively, the
reporter gene can encode a fluorescent protein. The FACS
method can screen large numbers of cells in liquid
culture. A FACS machine can be programmed to isolate a
fraction of cells whose fluorescence exceeds a desired
limit. These cells are those containing the polypeptides
with the highest binding affinities.

In all screening methods, plasmids encoding
polypeptides showing binding activity on a first screen
can be pooled, if desired, retransformed into host cells
and rescreened by the same general approach. The
variable portion of the plasmids is then sequenced to
determine the nucleic acid sequence (and the deduced
amino acid sequence) of the RNA binding proteins
identified by the screening. RNA binding proteins can
then be produced by, for example, synthesizing
oligonucleotides encoding the RNA binding protein and
expressing the same in cell culture.

VI. Applications

RNA binding polypeptides isolated by the methods
described above have a variety of uses. In one
application, RNA binding polypeptides are used in
therapeutic methods to block the life-cycle of pathogenic
microorganisms, including viruses, such as HIV, and
bacteria. Some synthetic RNA binding polypeptides are
used as antagonists of a naturally occurring RNA binding
protein. A synthetic polypeptide occupies the target
site in competition with the natural protein without
fulfilling the physiological role of the natural protein.
The synthetic polypeptide thereby antagonizes the natural
protein and aborts the life-cycle of a pathogenic
microorganism. In such methods, the synthetic RNA
binding polypeptide preferably has a higher binding
affinity than the natural protein, and lacks functional
domains (other than the binding domain) present in the
natural protein. Other RNA binding polypeptides bind
unique sequences on the pathogen's mRNA for which there
may be no naturally occurring RNA binding protein. These
polypeptides interfere with replication or translation of
the pathogenic microorganism. For example, the RNA
binding protein can occlude the Shine-Dalgarno sequence
or initiation codon of a bacterial mRNA thereby
preventing translation. In mammalian diseases resulting
from impairment or loss of a natural RNA binding protein,
treatment with an exogenous RNA binding protein or an
analog that substitutes for, or agonizes, a natural
protein serves to ameliorate the disease. Some of these
synthetic polypeptides possess both an RNA binding
domain and a functional domain also present in the
naturally occurring protein.

The RNA binding proteins isolated by the methods
also serve as lead compounds for the development of
derivative compounds. The derivative compounds can
include chemical modifications of amino acids or replace
amino acids with chemical structures. The analogs should
have a stabilized electronic configuration and molecular
conformation that allows key functional groups to be
presented to the RNA binding site in substantially the
same way as the lead peptide. In particular, the
non-peptidic compounds will have spatial electronic
properties which are comparable to the polypeptide
binding region, but will typically be much smaller
molecules than the polypeptides, frequently having a
molecular weight below about 2 kD and preferably below
about 1 kD.

Identification of such non-peptidic compounds can be
performed through use of techniques known to those
working in the area of drug design. Such techniques
include, but are not limited to, self-consistent field
(SCF) analysis, configuration interaction (CI) analysis,
and normal mode dynamics analysis. Computer programs for
implementing these techniques are readily available. See
Rein et al., Computer-Assisted Modeling of Receptor-Ligand
Interactions (Alan Liss, New York, 1989).

RNA binding proteins or analogs are formulated for
therapeutic use as pharmaceutical compositions. The
compositions may also include, depending on the
formulation desired, pharmaceutically-acceptable,
non-toxic carriers or diluents, which are defined as vehicles
commonly used to formulate pharmaceutical compositions
for animal or human administration. The diluent is
selected so as not to affect the biological activity of
the combination. Examples of such diluents are distilled
water, physiological saline, Ringer's solutions, dextrose
solution, and Hank's solution. In addition, the
pharmaceutical composition or formulation may also
include other carriers, adjuvants, or nontoxic,
nontherapeutic, nonimmunogenic stabilizers and the like.

The RNA binding polypeptides isolated by the methods
are also useful in diagnostic methods. For example, an
RNA binding polypeptide with a specific affinity for an
RNA sequence encoded by a pathogenic microorganism can be
used to detect the microorganism. In one assay format,
the polypeptide is immobilized to a support, optionally
via a linker, and a sample, which may or may not contain
RNA from the microorganism, is contacted with the
support. Binding of the RNA from the microorganism to
the support can be detected by competition with binding
of a labelled synthetic RNA recognition sequence to the
immobilized RNA binding polypeptide. RNA binding
polypeptides are also useful in controlling the growth of
cells in culture.


VII. Kits 

The invention also provides kits useful for the
screening methods. The kits contain the first and second
DNA segments described in section II above, cloned into
the same or separate vectors. The kits may also contain
chemicals for performing a screen, such as X-gal, and
primers suitable for sequencing the vectors. The kits
usually include labelling or instructions indicating the
suitability of the kits for screening RNA binding
proteins and indicating how the vector(s) are to be used
for that purpose. The term "label" is used generically
to encompass any written or recorded material that is
attached to, or otherwise accompanies the diagnostic at
any time during its manufacture, transport, sale or use.

The following examples are provided to illustrate
but not to limit the invention.

EXAMPLES 

Example 1: Hybrid antiterminator protein 

In this example, a pBR322-derived vector was
constructed encoding a hybrid protein in which the
19-amino acid N-terminal RNA-binding sequence of the phage
λ N protein was replaced by an arginine-rich putative RNA
binding polypeptide from one of the following eucaryotic
proteins: HIV Rev, BIV Tat, or HIV Tat.

Synthetic oligonucleotide cassettes encoding HIV Rev
(maTRQARRNRRRRRWRR-aaaan), BIV Tat
(mgRPRGTRGKGRRIRRgggn), and HIV Tat (mRKKRRQRRR) peptides
were cloned into the unique BsmI and NcoI sites of
pBRptacN* (Franklin, J. Mol. Biol. 231, 343 (1993)),
creating fusion proteins at amino acid 20 of the λ N
protein. The sequence encoding the fusion protein was
linked to a tac promoter.

A second pACYC-derived vector was constructed
encoding a tac promoter linked to the phage λ termination
site Nut and a β-galactosidase structural gene. See
Franklin, J. Mol. Biol. 231, 343 (1993). Oligonucleotides
containing box A of the Nut site and the appropriate RNA
hairpin (Fig. 1A) in place of Box B were cloned into the
unique PstI and BamHI sites of pACnutTAT13 (Id.),
replacing the existing λ Nut site. Two additional GC
base pairs were added at the ends of each hairpin stem.

Plasmids were transformed into E. coli strain N567
(Franklin & Doelling, J. Bacteriol. 171, 2513 (1989)) and
bacteria were grown on LB plates or in tryptone broth
containing 50 mg/l ampicillin and/or 15 mg/l
chloramphenicol. In the β-galactosidase colony color
assay, the number of +s represents visual estimation of
blue intensity after growing colonies on plates
containing 0.08 mg/ml X-gal and 0.024 mg/ml IPTG (to
induce the tac promoters) for ~48 hr at 34°C.
β-galactosidase activity was also measured in permeabilized
cells using an ONPG colorimetric assay (Sambrook et al.,
Molecular Cloning (Cold Spring Harbor Laboratory Press,
Cold Spring Harbor, 1989)). Bacteria were grown at 34°C
to early log phase, IPTG was added to 0.5 mM, and cells
were grown for 1 additional hr to OD600 = 0.4-0.5. The
activity measured on plates and in permeabilized cells
correlates roughly as follows: +++++, 200 units
β-galactosidase/1 OD600 unit cells; ++++, 100-200 units; +++,
10-100 units; ++, 2-10 units; +, 1-10 units;
background.

Antitermination was observed only with specific
peptide-RNA interactions (Table 1). The activities of
HIV Rev and BIV Tat-N fusion proteins on their respective
reporters were lower than wild-type N on the Nut reporter
but were well above background levels. The plate assay
appears to be particularly sensitive (see below). The
lack of activity of the HIV Tat-N fusion protein on the
HIV TAR reporter is likely because additional cellular
factors are needed for high affinity binding. Jones &
Peterlin, Annu. Rev. Biochem. 63, 717 (1994). Western
analysis using anti-N polyclonal antiserum indicated that
steady-state expression levels of each N fusion protein
were slightly below that of wild-type N.

Table 1:

Antitermination by N proteins fused to
heterologous arginine-rich peptides.

Reporter   N-Fusion   X-gal   ONPG
Nut-       N                  1.9
           Rev                1.1
           BIV Tat            0.6
           HIV Tat            0.7
Nut        N          +++++   1020
           Rev                1.0
           BIV Tat            1.1
           HIV Tat            0.7
HIV RRE    N                  2.5
           Rev        +++     17
           BIV Tat            0.4
           HIV Tat            0.6
BIV TAR    N                  2.9
           Rev                1.7
           BIV Tat    +++     21
           HIV Tat            0.8
HIV TAR    N                  2.4
           Rev                1.5
           BIV Tat            0.6
           HIV Tat            0.9

Nut and Nut- (a Nut site deletion) reporters and the
N-expressing plasmid are described in Franklin, J. Mol. Biol.
231, 343 (1993).

In summary, this example shows that a hybrid
antiterminator protein containing a phage λ N protein
anti-terminator domain linked to a foreign RNA binding
polypeptide is functional when bound to RNA via the
foreign polypeptide. The hybrid protein exhibits
antitermination activity on a termination site proximal
to the RNA recognition sequence bound by the foreign
polypeptide resulting in expression of the reporter gene.

Example 2: Use of linkers to enhance display of RNA
binding polypeptides

This example determines the most favorable local
context in which to display a helical or nonhelical
peptide. HIV Rev and BIV Tat peptides were fused into
the λ N protein as described above (i.e., with the
inserted peptide replacing amino acids 1-19 of the N
protein) except that the N-protein and peptide domains
were separated by a linker of four alanines or three
glycines. The alanine linker increased activity of the
Rev-N fusion protein whereas the glycine linker decreased
activity (Fig. 2). The increase may be due to increased
helicity imparted by the linker. Both HIV Rev and λ N
proteins require α-helical conformations to bind
specifically to their RNA sites. See Tan et al., Cell
73, 1031 (1993); Oubridge et al., Nature 372, 432 (1994).
However, addition of a second alanine linker to the
N-terminus of the fusion protein reduced activity,
suggesting that factors other than peptide helicity can
influence antitermination activity. For display of the
BIV Tat peptide, a glycine linker between the peptide and
the N-protein provided the most favorable context for
display (Fig. 2). Additional experiments indicated that
Rev and BIV Tat peptides could be shortened to 14 amino
acids, from previously used 17-amino acid versions, with
little effect on activity (Fig. 2). Thus, peptide
library experiments described below were performed with
14 randomized positions and either alanine or glycine
linkers.

Example 3: Random Mutagenesis of a single position in an
RNA binding protein

The capacity of the system to distinguish between
different RNA-binding affinities was tested by creating
a small BIV Tat peptide library in which the codon for
isoleucine at position 79 was randomized. The
hydrophobicity of this amino acid is important for
binding to BIV TAR, and in vivo activities of 15
substitution mutants in mammalian cells are known. The
library was screened on the BIV TAR reporter and plasmids
were sequenced from 77 colonies displaying a variety of
blue intensities. All 15 dark blue (+++) colonies
encoded large hydrophobic residues (I, Y, F, and L), 29
medium blue (++) colonies generally encoded smaller
hydrophobic or uncharged residues, and 31 light blue (+)
or background (-) colonies generally encoded amino acids
with charged or small side chains. Antitermination
activities determined by β-galactosidase assays
correlated qualitatively with binding activities
determined in HeLa cells using HIV-LTR CAT reporter
(Fig. 3). See Chen & Frankel, Biochemistry 33, 2708
(1994).

Example 4: Screening a Combinatorial Library of RNA
binding Polypeptides

This example shows the feasibility of isolating
specific RRE-binding peptides from a combinatorial
library in which each position in the Rev binding domain
is varied. A library was constructed with the known
requirements for Rev binding in mind: TRQARRNRRRRWRR in
an α-helical context with important residues in bold;
remaining residues can be replaced individually by
lysines or alanines. Tan et al., Cell 73, 1031 (1993);
Tan & Frankel, Biochemistry 33, 14579 (1994). A 14-mer
peptide library containing any one of arginine, serine,
asparagine, or histidine at each position (RSNH library;
Fig. 1B) was constructed in the context of an alanine
linker, and the proportion of each amino acid was
adjusted to favor arginine.

The library was encoded by a degenerate
oligonucleotide having the formula:
5'-AGGAGAATCCCCATGGCC(XYT)14GCAGCTGCGGCGAATGCAGCAAATCCCCTG-3',
where X is a C:A mixture at a 3:1 ratio and Y is an A:G
mixture at a 1:3 ratio. Each randomized codon (XYT) encodes
R : S, H : N at a ratio of 56.25% : 18.75% : 6.25% (S and H
at 18.75% each). A primer
(5'-CAGGGGATTTGCTGCATTC-3') was annealed to the degenerate
oligonucleotide and double-stranded DNA was synthesized
using Sequenase 2.0 (USB). The DNA was cloned into the
BsmI and NcoI sites of pBRptacN*.
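
The residue proportions and the primer design stated above can be checked with a short calculation. This is a sketch only: the helper names are hypothetical, the two positions of the XYT codon are assumed to be incorporated independently at exactly the stated ratios, and the 3' constant region is taken from the oligonucleotide formula given in the text.

```python
from itertools import product

# Nominal base mixtures at the two variable positions of the XYT codon.
X_MIX = {"C": 0.75, "A": 0.25}  # first position, C:A = 3:1
Y_MIX = {"A": 0.25, "G": 0.75}  # second position, A:G = 1:3

# The four codons reachable with a fixed third-position T (standard code).
CODON_TO_RESIDUE = {"CGT": "R", "AGT": "S", "CAT": "H", "AAT": "N"}

# 3' constant region of the library oligonucleotide and the extension primer.
THREE_PRIME_ARM = "GCAGCTGCGGCGAATGCAGCAAATCCCCTG"
PRIMER = "CAGGGGATTTGCTGCATTC"


def residue_frequencies():
    """Probability of each residue at a single randomized XYT position."""
    freqs = {}
    for (x, px), (y, py) in product(X_MIX.items(), Y_MIX.items()):
        freqs[CODON_TO_RESIDUE[x + y + "T"]] = px * py
    return freqs


def reverse_complement(dna):
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


if __name__ == "__main__":
    for residue, p in sorted(residue_frequencies().items(), key=lambda kv: -kv[1]):
        print(f"{residue}: {p:.2%}")
    # The primer should anneal to the 3' end of the degenerate oligonucleotide.
    print(THREE_PRIME_ARM.endswith(reverse_complement(PRIMER)))
```

Under these assumptions arginine (CGT) comes out at 56.25%, serine (AGT) and histidine (CAT) at 18.75% each, and asparagine (AAT) at 6.25%, and the primer is the exact reverse complement of the last 19 nucleotides of the oligonucleotide.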

The RSNH library was transformed into cells
containing an RRE reporter vector. Approximately 600,000
colonies (0.2% of the library) were screened, and 1920
visibly blue colonies were picked. To eliminate false
positives (frequency about 0.5%), N-expressing plasmids
were purified from pooled blue colonies, transformed into
cells containing an RRE reporter vector and rescreened.
Plasmids were then gel-purified from individual blue
colonies to remove the reporter plasmid and were screened
with RRE and BIV TAR reporter cells to identify N fusion
proteins specific for the RRE. The majority (~85%) of
N-fusion plasmids in this screen exhibited nonspecific
antitermination activity, showing at least some activity
on both RRE and BIV TAR reporters.
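
The coverage figure given above can be reproduced with a few lines
of arithmetic. The following Python sketch is illustrative only and
rests on two assumptions: that the library size is counted as the
number of distinct 14-mers over four residues (4^14), and that each
position is drawn independently at the per-codon frequencies stated
for the degenerate oligonucleotide. The function names are
hypothetical.

```python
from math import prod

POSITIONS = 14
# Per-codon residue frequencies stated for the (XYT) codon.
RESIDUE_FREQ = {"R": 0.5625, "S": 0.1875, "H": 0.1875, "N": 0.0625}


def library_size(positions=POSITIONS, choices=len(RESIDUE_FREQ)):
    """Number of distinct peptide sequences in the library."""
    return choices ** positions


def percent_screened(colonies, size):
    """Colonies screened, expressed as a percentage of the library size."""
    return 100.0 * colonies / size


def sequence_probability(seq):
    """Probability that one random library member has exactly this sequence,
    assuming independent positions (only R, S, H and N are allowed)."""
    return prod(RESIDUE_FREQ[aa] for aa in seq)


if __name__ == "__main__":
    size = library_size()
    print("library size:", size)
    print("percent screened:", round(percent_screened(600_000, size), 2))
    print("P(all-arginine 14-mer):", sequence_probability("R" * POSITIONS))
    print("P(SRSSRRNRRRRRRR):", sequence_probability("SRSSRRNRRRRRRR"))
```

Because arginine is favored, arginine-rich sequences are sampled far
more often than the uniform figure suggests, so the nominal
percentage understates coverage of arginine-rich peptides and
overstates coverage of arginine-poor ones.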

Sequences of 19 RRE-specific clones were determined
and four unique sequences were found (Table 2). To
eliminate false positives that may have arisen from
mutations outside the cloned peptide region,
oligonucleotides encoding selected positive peptides
were synthesized and recloned into pBRptacN*, and the
plasmids were retested with RRE and BIV TAR reporter cells. All
showed RRE-specific activity.

Table 2: Antitermination activities of the HIV Rev peptide and
selected clones

                                  RRE           RRE"         BIV TAR
             sequence         X-gal  ONPG   X-gal  ONPG   X-gal  ONPG
             TRQARRNRRRRWRR   +++    66            7.5           6.6
             SRQARRNRRRRWRR   +++    50            6.8           6.9
clone 24     SRSSRRNRRRRRRR   +++    67            8.8           8.7
clone 39     SRSRRRNRRRRRRR   ++     54            13            9.5
clone 41     NHRRRRRHRRRRRR   +      3.5           3.2           2.7
clone 57     NHRRRRRQRRRRRR   ++     12            2.5           3.1
BIV Tat                              1.4           1.2           35
mutant 41-1  NSRRRRRHRRRRRR          2.3           2.4           2.5
mutant 57-1  NSRRRRRQRRRRRR          7.1           2.4           1.7
mutant 57-2  NRRRRRRQRRRRRR          1.4           1.0           1.1
mutant 57-3  NHRRRRRNRRRRRR          2.7           2.8           2.4
mutant 57-4  RHRRRRRQRRRRRR          1.4           1.0           1.2

The RRE" reporter contains a G46:C74 to C:G base pair substitution
that markedly reduces Rev peptide binding affinity (10). Bold amino
acids in Rev are important for binding, and analogous residues in the
Rev-like clones 24 and 39 are indicated. Mutations in the non-Rev-
like clones 41 and 57 are underlined.



Two of the four selected peptides (clones 24 and 39)
were Rev-like (SRxxRRNxxxRxxx) and exhibited specific
antitermination activities comparable to the wild-type
Rev peptide (Table 2). In both peptides, arginines were
found at all positions in the C-terminal half of the
peptide, suggesting that a high charge density may be
important for binding. The two remaining peptides
(clones 41 and 57) did not match the Rev consensus
sequence, and clone 57 contained a glutamine residue that
apparently arose from mutation of a histidine codon. The
two non-Rev-like peptides exhibited weak but specific
antitermination activities (Table 2). The activity of
clone 41 was clearly detectable only using the colony
color assay. At low activities, the colony color assay
appears to be more sensitive than the ONPG assay,
presumably because colony color reflects β-galactosidase
activity accumulated during 48 hr of growth. The level
of activity in the ONPG assay reflects activity after
1 hr of induction and depends on the rate of N-fusion
protein synthesis, which may differ significantly between
clones.
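
The classification of the four selected clones as Rev-like or
non-Rev-like can be expressed as a simple pattern match. The Python
sketch below is illustrative only; it assumes that each "x" in the
consensus SRxxRRNxxxRxxx stands for any single residue, and the
names REV_LIKE and is_rev_like are hypothetical.

```python
import re

# Consensus SRxxRRNxxxRxxx, with each "x" read as any one residue (assumption).
REV_LIKE = re.compile(r"^SR..RRN...R...$")

# Selected clone sequences as listed in Table 2.
CLONES = {
    "clone 24": "SRSSRRNRRRRRRR",
    "clone 39": "SRSRRRNRRRRRRR",
    "clone 41": "NHRRRRRHRRRRRR",
    "clone 57": "NHRRRRRQRRRRRR",
}


def is_rev_like(sequence):
    """Return True if the 14-mer matches the Rev-like consensus."""
    return REV_LIKE.match(sequence) is not None


if __name__ == "__main__":
    for name, seq in CLONES.items():
        label = "Rev-like" if is_rev_like(seq) else "non-Rev-like"
        print(f"{name}: {seq} {label}")
```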

Because the spacing of non-arginine residues in the
two non-Rev-like peptides was similar to the spacing of
serine and asparagine in Rev (clones 41 and 57 have
histidines at position 2 and either histidine or
glutamine at position 8), the activities of several
mutants were tested to assess whether the mode of binding
might be related to Rev. The identities of the non-
arginine side chains were found to be important for
binding (Table 2) and different from the side chain
requirements in Rev. In clone 57 glutamine could not be
replaced by asparagine, while asparagine appears to be
important at the N-terminus. Therefore, the binding
modes appear to be distinct from Rev.
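
The substitution carried by each mutant in Table 2 can be read off
by comparing it position by position with its parent clone. The
Python sketch below is illustrative only; the pairing of each mutant
with a parent is inferred from the mutant names (41-1 from clone 41,
57-1 to 57-4 from clone 57), and the function name is hypothetical.

```python
# Parent clone sequences and mutant sequences as listed in Table 2.
PARENTS = {"41": "NHRRRRRHRRRRRR", "57": "NHRRRRRQRRRRRR"}
MUTANTS = {
    "41-1": "NSRRRRRHRRRRRR",
    "57-1": "NSRRRRRQRRRRRR",
    "57-2": "NRRRRRRQRRRRRR",
    "57-3": "NHRRRRRNRRRRRR",
    "57-4": "RHRRRRRQRRRRRR",
}


def substitutions(parent, mutant):
    """List differences as e.g. 'Q8N' (parent residue, 1-based position, new residue)."""
    if len(parent) != len(mutant):
        raise ValueError("sequences must have the same length")
    return [
        f"{p}{i}{m}"
        for i, (p, m) in enumerate(zip(parent, mutant), start=1)
        if p != m
    ]


if __name__ == "__main__":
    for name, seq in MUTANTS.items():
        parent = PARENTS[name.split("-")[0]]  # parent inferred from the mutant name
        print(f"mutant {name}: {', '.join(substitutions(parent, seq))}")
```

Read this way, mutant 57-3 replaces the glutamine at position 8 with
asparagine and mutant 57-4 replaces the N-terminal asparagine with
arginine, which are the two substitutions discussed in the text.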

To confirm that the antitermination activities
measured in vivo accurately reflect RNA-binding
properties of the peptides, binding affinities and
specificities of corresponding synthetic peptides were
measured in vitro. Peptides were synthesized on an
Applied Biosystems Model 432A peptide synthesizer and
purified as described by Chen & Frankel, Biochemistry 33,
2708 (1994). All peptides were capped by a succinyl
group at the N-terminus and by four alanines and an amide
group at the C-terminus. Peptide concentrations were
determined by tryptophan absorbance or by peptide
absorbance using known peptides as standards. The purity
and concentrations of peptides were confirmed by native
gel electrophoresis (10% polyacrylamide in 30 mM sodium
acetate, pH 4.5). Peptide molecular weights were
confirmed by electrospray mass spectrometry (University
of Michigan Protein and Carbohydrate Structure Facility).
RNAs were transcribed in vitro using T7 RNA polymerase
[Milligan & Uhlenbeck, Methods Enzymol. 180, 51 (1989)]
and labelled with [α-³²P]CTP (NEN, 3000 Ci/mmol). RNAs
were purified and concentrations were determined as
described by Chen & Frankel, Biochemistry 33, 2708 (1994).
RNA-binding gel shift assays were carried out by
incubating peptide and RNA at 4°C in 10 µl binding
mixtures containing 10 mM HEPES-KOH, pH 7.5, 100 mM KCl,
1 mM MgCl₂, 0.5 mM EDTA, 1 mM dithiothreitol, 50 µg/ml
tRNA, and 10% glycerol. To determine relative binding
affinities, 1-5 nM radiolabeled RNAs were titrated with
peptide, and peptide-RNA complexes were resolved on 10%
polyacrylamide, 0.5x TBE gels that had been prerun for
1 hr and allowed to cool to 4°C.

The results from the in vitro binding assays
correlated well with the antitermination assay: the two
Rev-like peptides specifically bound the RRE with
affinities comparable to the wild-type Rev peptide, the
clone 57 peptide bound with a moderate preference for the
RRE, and the clone 41 peptide bound with only a very
slight preference for the RRE (Fig. 4A).

Circular dichroism was used to assess whether the
selected peptides adopted α-helical conformations.
Circular dichroism spectra were measured using an Aviv
model 62DS spectropolarimeter. Samples were prepared in
10 mM potassium phosphate buffer, pH 7.5, and 100 mM KF.
Spectra were recorded using a 1 cm pathlength cuvette at
4°C and signal was averaged for 5 sec at each wavelength.
Scans were repeated five times and averaged. Mean
molecular ellipticity was calculated per amino acid
residue and helical content was estimated from the value
at 222 nm (Chen et al., Biochemistry 13, 3350 (1974)). As
shown in Fig. 4B, the 14-amino acid wild-type Rev peptide
was somewhat less helical than the previously used
17-amino acid version (11% versus 43%), and the selected
Rev-like peptides were even less helical (5-6%). The
non-Rev-like peptides showed very little helix formation,
probably explaining the marginal in vitro binding
specificities for the RRE. The non-Rev-like peptides may
be slightly more helical in the context of the N fusion
proteins in vivo and therefore able to display some
specific antitermination activity.
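
One common way to turn the ellipticity at 222 nm into a helical
fraction is the chain-length-dependent relation usually attributed
to the cited Chen et al. reference. The Python sketch below is
illustrative only. The text does not state the exact formula or
constants that were used, so the reference ellipticity of -39,500
deg cm² dmol⁻¹ for an infinite helix, the chain-length constant of
2.57, the choice of residue count, and the input value in the usage
line are all assumptions.

```python
# Assumed constants for the 222 nm chain-length-dependent relation (not from the text).
THETA_INF_222 = -39500.0  # deg cm^2 dmol^-1, mean residue ellipticity of an infinite helix
K_222 = 2.57              # chain-length correction constant


def helical_fraction(theta_222, n_residues, theta_inf=THETA_INF_222, k=K_222):
    """Estimate fractional helicity from mean residue ellipticity at 222 nm.

    theta_222  : measured mean residue ellipticity (deg cm^2 dmol^-1)
    n_residues : number of residues in the peptide
    """
    if n_residues <= k:
        raise ValueError("n_residues must exceed the chain-length constant k")
    # Ellipticity expected for a fully helical peptide of this length.
    theta_full_helix = theta_inf * (1.0 - k / n_residues)
    return theta_222 / theta_full_helix


if __name__ == "__main__":
    # Hypothetical input value, chosen only to show the call.
    print(f"{helical_fraction(-5000.0, 14) * 100:.1f}% helix")
```

The length correction matters for peptides this short, because the
ellipticity expected for a fully helical 14-mer is noticeably smaller
in magnitude than the infinite-helix value.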

This example shows that the disclosed screening
method can select RNA binding polypeptides having
substantial specific binding affinity from a
combinatorial library.

Although the foregoing invention has been described
in detail for purposes of clarity of understanding, it
will be obvious that certain modifications may be
practiced within the scope of the appended claims. All
publications and patent documents cited above are hereby
incorporated by reference in their entirety for all
purposes to the same extent as if each were so
individually denoted.

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 14 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

Thr Arg Gln Ala Arg Arg Asn Arg Arg Arg Arg Trp Arg Arg
1               5                   10

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 19 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (primer)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

CAGGGGATTT GCTGCATTC 19

(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 14 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

Ser Arg Gln Ala Arg Arg Asn Arg Arg Arg Arg Trp Arg Arg
1               5                   10

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 14 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

Ser Arg Ser Ser Arg Arg Asn Arg Arg Arg Arg Arg Arg Arg
1               5                   10

(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 14 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

Ser Arg Ser Arg Arg Arg Asn Arg Arg Arg Arg Arg Arg Arg
1               5                   10

(2) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 14 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

Asn His Arg Arg Arg Arg Arg His Arg Arg Arg Arg Arg Arg
1               5                   10

(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 14 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

Asn His Arg Arg Arg Arg Arg Gln Arg Arg Arg Arg Arg Arg
1               5                   10

(2) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 14 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

Asn Ser Arg Arg Arg Arg Arg His Arg Arg Arg Arg Arg Arg
1               5                   10

(2) INFORMATION FOR SEQ ID NO:9:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 14 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:

Asn Ser Arg Arg Arg Arg Arg Gln Arg Arg Arg Arg Arg Arg
1               5                   10

(2) INFORMATION FOR SEQ ID NO:10:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 14 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:

Asn Arg Arg Arg Arg Arg Arg Gln Arg Arg Arg Arg Arg Arg
1               5                   10

(2) INFORMATION FOR SEQ ID NO:11:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 14 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:

Asn His Arg Arg Arg Arg Arg Asn Arg Arg Arg Arg Arg Arg
1               5                   10

(2) INFORMATION FOR SEQ ID NO:12:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 14 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:

Arg His Arg Arg Arg Arg Arg Gln Arg Arg Arg Arg Arg Arg
1               5                   10

(2) INFORMATION FOR SEQ ID NO:13:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 14 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:

Ser Arg Xaa Xaa Arg Arg Asn Xaa Xaa Xaa Arg Xaa Xaa Xaa
1               5                   10

(2) INFORMATION FOR SEQ ID NO:14:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: RNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:

UCUGGGCGCA GCGCAAGCUG ACGGUACAGA 30

(2) INFORMATION FOR SEQ ID NO:15:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 19 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15:

Met Asp Ala Gln Thr Arg Arg Arg Glu Arg Arg Ala Glu Lys Gln Ala
1               5                   10                  15

Gln Trp Asn

(2) INFORMATION FOR SEQ ID NO:16:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 15 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: RNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:

GGCCUGAAGA AGGCC 15



(2) INFORMATION FOR SEQ ID NO:17:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 17 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:17:

Arg Pro Arg Gly Thr Arg Gly Lys Gly Arg Arg Ile Arg Arg
1               5                   10

Gln Trp Asn

(2) INFORMATION FOR SEQ ID NO:18:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: RNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:18:

UCGUGUAGCU CAUUAGCUCG A 21

(2) INFORMATION FOR SEQ ID NO:19:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:19:

Arg Lys Lys Arg Arg Gln Arg Arg Arg
1               5

(2) INFORMATION FOR SEQ ID NO:20:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 23 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: RNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20:

AGAUCUGAGC CUGGGAGCUC UCU 23

What is claimed is:

1. A method of screening a plurality of
polypeptides for RNA binding activity, the method
comprising:

(1) culturing a library of cells, each cell
comprising at least one vector comprising:

a first DNA segment encoding a fusion protein
comprising a fragment of an anti-terminator protein
having anti-terminator activity linked in-frame to a
polypeptide under test which varies between cells in the
library, and

a second DNA segment encoding in operable
linkage a promoter, an RNA recognition sequence foreign
to the anti-terminator protein, a transcription
termination site and a reporter gene, wherein the
termination site blocks transcription of the reporter
gene in the absence of a protein with anti-termination
activity and affinity for the RNA recognition sequence;

whereby the first DNA segment is expressed to
yield the fusion protein, which, if the polypeptide under
test has a specific affinity for the recognition
sequence, binds via the polypeptide to the RNA
recognition sequence of a transcript from the second DNA
segment thereby inducing transcription of the second DNA
segment to proceed through the termination site to the
reporter gene resulting in expression of the reporter
gene;

(2) detecting expression of the reporter gene in a
cell from the library, the expression indicating that the
cell comprises a polypeptide having RNA binding activity.

2. The method of claim 1, further comprising
isolating the cell expressing the reporter gene.

3. The method of claim 2, wherein the cell line is
procaryotic.

4. The method of claim 3, wherein the
antiterminator protein is a phage antiterminator protein.

5. The method of claim 4, wherein the second DNA
segment further encodes a Box A sequence operably linked
to the RNA recognition sequence and transcription
termination site.

6. The method of claim 5, wherein the cell line
comprises first and second vectors, the first vector
comprising the first DNA segment and the second vector
comprising the second DNA segment.

7. The method of claim 1, wherein the first DNA
segments comprise fragments from a cDNA or genomic
library encoding the different polypeptides.

8. The method of claim 7, further comprising the
step of incorporating the fragments into the vector.

9. The method of claim 1, wherein the different
polypeptides vary in at least five amino acid positions.

10. The method of claim 9, wherein at least one
position can be occupied by any one of at least four
amino acids.

11. The method of claim 1, wherein the polypeptide
contains 6-25 amino acids.

12. The method of claim 1, wherein there are at
least 10⁸ different polypeptides.

13. The method of claim 1, wherein the reporter
gene is a selectable gene.

14. The method of claim 13, further comprising
propagating the cell line in a selectable medium to
select for a cell containing a polypeptide under test
with a specific affinity for the RNA recognition
sequence.

15. The method of claim 14, further comprising
propagating clonal colonies on a culture plate from the
procaryotic cell culture.

16. The method of claim 1, wherein the rate of
expression correlates with the specific affinity of the
peptide for the RNA recognition sequence.

17. The method of claim 3, wherein the cell line is
E. coli.

18. The method of claim 4, wherein the anti-
terminator protein is the phage λ N protein.

19. The method of claim 18, wherein the polypeptide
is a variant of the HIV Rev protein and the RNA
recognition sequence is the HIV RRE sequence.

20. The method of claim 1, wherein expression is
detected by FACS.

21. A method of screening a polypeptide for RNA
binding activity, the method comprising:

(1) culturing a procaryotic cell line comprising at
least one vector comprising:

a first DNA segment encoding a fusion protein
comprising an anti-terminator protein or a fragment
thereof having anti-terminator activity linked in-frame
to a polypeptide under test, and

a second DNA segment encoding in operable
linkage a promoter, an RNA recognition sequence foreign
to the anti-terminator protein, a transcription
termination site and an expression product of a reporter
gene, wherein the termination site blocks transcription
of the reporter gene in the absence of a polypeptide with
anti-termination activity and specific affinity for the
RNA recognition sequence;

whereby the first DNA segment is expressed to
yield the fusion protein, which, if the polypeptide under
test has a specific affinity for the RNA recognition
sequence, binds via the polypeptide to the RNA
recognition sequence of a transcript from the second DNA
segment thereby inducing transcription of the second DNA
segment to proceed through the termination site to the
reporter gene resulting in expression of the reporter
gene;

(2) detecting the expression to indicate that the
polypeptide has specific affinity for the RNA recognition
sequence.

22. A method of screening a library of RNA for
binding to a selected polypeptide, the method comprising:

(1) culturing a library of cells, each cell
comprising at least one vector comprising

a first DNA segment encoding a fusion protein
comprising a fragment of an anti-terminator protein having
anti-terminator activity linked in-frame to a selected
polypeptide;

a second DNA segment encoding in operable
linkage a promoter, an RNA sequence varying between
different cells in the library, a termination site and a
reporter gene, wherein the termination site blocks
transcription of the reporter gene unless the RNA
sequence has a specific affinity for the selected
polypeptide;

whereby the first DNA segment is expressed to yield
the fusion protein, which, if the RNA sequence has a
specific affinity for the selected polypeptide, binds via
the selected polypeptide to the RNA sequence of a
transcript from the second DNA segment thereby inducing
transcription of the second DNA segment to proceed
through the termination site to the reporter gene
resulting in expression of the reporter gene;

(2) detecting expression of the reporter gene in a
cell from the library, the expression indicating that the
cell comprises an RNA sequence having affinity for the
polypeptide.

23. A kit for screening peptides for RNA binding
activity, comprising first and second DNA segments of
claim 1.

FIG. 1A

FIG. 3